This document is intended for audiences internal to the project team for the Boulder City, CO Guaranteed Income project (BGI). The data presented here is simulated or uses publicly available estimates for certain parameters (meaning there are no PII, IRB, or proprietary concerns with this document, but it should not be circulated beyond the core project team).
For team members interested in the process for weighting and selection, the key sections are:
For team members who are primarily interested in seeing an example of the spreadsheet that tracks each sampling wave, please see the Appendix in Section 6. The first table is the condensed version that shows just the IDs for each wave. The second table is the expanded version that shows the demographic characteristics for the individuals in each wave. The data can be downloaded via the clipboard or a csv file.
Steps in the weighting process:
1st) Simulate a population dataset based on questions from the recruitment form. The simulated dataset is for illustrating the process. It represents an estimate of the total population of those with incomes between 30% and 60% of area median income (AMI) in Boulder City.
2nd) Randomly sample 4000 applicants from the simulated population data. Our interest is in evaluating what a weighted sample of this applicant pool looks like.
3rd) Select the first ‘wave’ of 200 program selections using two methods: (A) a custom weighting procedure, and (B) a purely random sample of 200 selections from the applicant pool.
4th) Select the second and third waves using propensity score matching against the applicant pool (Ho et al. 2011; see the note at the end of this document).
Ideally, make all matches based on estimates of the population in Boulder City who are either a) between 30% and 60% of area median income (AMI) or b) below the poverty line.
Match proportionately by race/ethnicity, gender identity, and disability status.
Individuals with children under 18 should be represented in the program at roughly twice their representation in the applicant pool.
The eligibility questionnaire will have questions on each of the above, plus additional eligibility and other characteristics not addressed here.
Ethnicity/race options:
Hispanic/Latino
Black or African American
White (not Latino)
Asian
2 or more races
Not listed
Native Hawaiian/Pacific Islander
American Indian/Alaskan Native
Gender:
Woman
Man
Transgender
Prefer to self identify (please write in your preferred identity here)
Households with children under 18
Yes
No
Disability status:
The table below shows the probabilities that we are working with in the current iteration of our simulated data. These are based on empirical estimates.
Race and ethnicity: estimates derived from a City of Boulder online source. The poverty measure is the U.S. Census Bureau’s definition of poverty.
Gender: estimates are derived from the Williams Institute and are based on the entire state of Colorado. The only estimate available from that source is the percent transgender.
Children in household: estimates will be based on the applicant pool (the values in the table are placeholders).
Disability: studies have shown that about 25% of the population is disabled at any given time, and we will use this as our background expectation for Boulder City.
| sub_group | target_props |
|---|---|
| race_ethnicity | |
| White (not latino) | 0.756 |
| Hispanic | 0.100 |
| Black or African American | 0.014 |
| Asian | 0.051 |
| American Indian or Alaska Native | 0.002 |
| Native Hawaiian or Other Pacific Islander | 0.001 |
| Not Listed | 0.021 |
| Two or more | 0.055 |
| gender | |
| Woman | 0.440 |
| Man | 0.440 |
| Transgender | 0.060 |
| Prefer to self identify | 0.060 |
| child_household | |
| No | 0.800 |
| Yes | 0.200 |
| disability | |
| No | 0.750 |
| Yes | 0.250 |
Start with data that represent a loosely informed estimate of the ‘total population’.
A few example rows from the simulated population sample:
| id | race_ethnicity | gender | child_household | disability |
|---|---|---|---|---|
| 19526 | White (not latino) | Man | No | Yes |
| 2295 | White (not latino) | Woman | No | No |
| 6964 | White (not latino) | Man | No | No |
| 7717 | White (not latino) | Prefer to self identify | Yes | Yes |
| 519 | White (not latino) | Man | Yes | No |
| 24333 | White (not latino) | Man | No | No |
| 16499 | White (not latino) | Woman | No | No |
| 607 | White (not latino) | Man | Yes | No |
| 18385 | Asian | Man | No | No |
| 11269 | White (not latino) | Man | No | Yes |
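One way such a population could be simulated is sketched below. This is a minimal illustration, not the code used to produce the rows above: it assumes each characteristic is drawn independently from the target proportions, and the function name `simulate_population` and the seed are hypothetical. The population size of 25000 is the figure used later in this document.

```python
import random

# Target proportions copied from the target_props table in this document.
TARGET_PROPS = {
    "race_ethnicity": {
        "White (not latino)": 0.756,
        "Hispanic": 0.100,
        "Black or African American": 0.014,
        "Asian": 0.051,
        "American Indian or Alaska Native": 0.002,
        "Native Hawaiian or Other Pacific Islander": 0.001,
        "Not Listed": 0.021,
        "Two or more": 0.055,
    },
    "gender": {
        "Woman": 0.44,
        "Man": 0.44,
        "Transgender": 0.06,
        "Prefer to self identify": 0.06,
    },
    "child_household": {"No": 0.80, "Yes": 0.20},
    "disability": {"No": 0.75, "Yes": 0.25},
}


def simulate_population(n, target_props, seed=1):
    """Draw n simulated individuals.

    Assumption: characteristics are independent of one another, so each
    column is drawn separately from its own target proportions.
    """
    rng = random.Random(seed)
    columns = {}
    for name, props in target_props.items():
        levels = list(props)
        weights = list(props.values())
        columns[name] = rng.choices(levels, weights=weights, k=n)
    return [
        {"id": i + 1, **{name: columns[name][i] for name in target_props}}
        for i in range(n)
    ]


population = simulate_population(25000, TARGET_PROPS)
print(population[0])
```

Drawing each column independently is the simplest choice. If characteristics are correlated in the real population, a joint distribution would be needed instead.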
Randomly select 4000 applicants from the simulated population.
| sub_group | count | proportions | target_proportions |
|---|---|---|---|
| child_household | |||
| No | 3274 | 0.819 | 0.800 |
| Yes | 726 | 0.182 | 0.200 |
| disability | |||
| No | 2979 | 0.745 | 0.750 |
| Yes | 1021 | 0.255 | 0.250 |
| gender | |||
| Man | 1727 | 0.432 | 0.440 |
| Prefer to self identify | 259 | 0.065 | 0.060 |
| Transgender | 242 | 0.060 | 0.060 |
| Woman | 1772 | 0.443 | 0.440 |
| race_ethnicity | |||
| American Indian or Alaska Native | 6 | 0.002 | 0.002 |
| Asian | 185 | 0.046 | 0.051 |
| Black or African American | 58 | 0.015 | 0.014 |
| Hispanic | 415 | 0.104 | 0.100 |
| Native Hawaiian or Other Pacific Islander | 7 | 0.002 | 0.001 |
| Not Listed | 82 | 0.020 | 0.021 |
| Two or more | 231 | 0.058 | 0.055 |
| White (not latino) | 3016 | 0.754 | 0.756 |
Note: as a reminder/clarifier, in the above table the ‘proportions’ column is what we observe when we select 4000 rows/individuals from our simulated population data. The target_proportions are the values used to simulate the population data.
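The sampling and tabulation step can be sketched as follows. To stay self-contained, the example builds a hypothetical stand-in population with a single characteristic (disability) rather than the full simulated dataset. The helper name `proportion_table` and the seed are assumptions made for illustration.

```python
import random
from collections import Counter

rng = random.Random(42)

# Hypothetical stand-in population: 25000 people, one characteristic only.
target = {"No": 0.75, "Yes": 0.25}
levels, weights = list(target), list(target.values())
population = [
    {"id": i + 1, "disability": rng.choices(levels, weights=weights)[0]}
    for i in range(25000)
]

# Simple random sample without replacement: the simulated applicant pool.
applicants = rng.sample(population, k=4000)


def proportion_table(rows, column, target_props):
    """Observed counts and proportions next to the target proportions."""
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return [
        {
            "sub_group": level,
            "count": counts[level],
            "proportions": round(counts[level] / total, 3),
            "target_proportions": target_props[level],
        }
        for level in sorted(target_props)
    ]


table = proportion_table(applicants, "disability", target)
for line in table:
    print(line)
```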
To select the first sample wave of 200 individuals from our pool of 4000 applicants, we first take a custom weighted sample of the data using the target proportions in Table 5.3.
The customized weighting procedure:
Calculate the expected number of individuals with each characteristic in a sample of 200 if they were in the sample at exactly their expected proportions.
For any characteristic with an expected count <= 3, add three to its expected count. This is another way of increasing the representation of rare characteristics.
Reserve 25% of the target sample size of 200 for individuals with rare characteristics. These are defined by examining the applicant pool and simply counting the characteristics of all the people in the pool. A random sample of 50 (25% of
The remaining 75% are chosen by a simple weighting from the enrollee pool.
Lastly, if any characteristics are present in the enrollee pool but still missing from the selected sample, select one person at random with that characteristic and replace someone chosen at random with the most common set of characteristics.
The target proportions in Table 5.3 are based on characteristics of participants, so this first step in the sampling selects more than 200. We then select 200 people for the first sampling wave using the procedure just described.
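The first two steps of the customized weighting procedure (expected counts, then the boost for small counts) can be sketched as below, using the race and ethnicity targets. The rounding rule is an assumption: rounding to the nearest whole person with a floor of one reproduces the target_counts column shown in this section, but the rule actually used is not stated here. The function names are hypothetical.

```python
SAMPLE_SIZE = 200
RARE_THRESHOLD = 3  # expected counts at or below this value get boosted
BOOST = 3           # number of people added to a boosted count

# Target proportions for race and ethnicity, from the target_props table.
race_props = {
    "American Indian or Alaska Native": 0.002,
    "Native Hawaiian or Other Pacific Islander": 0.001,
    "Black or African American": 0.014,
    "Not Listed": 0.021,
    "Asian": 0.051,
    "Two or more": 0.055,
    "Hispanic": 0.100,
    "White (not latino)": 0.756,
}


def expected_counts(props, n):
    """Expected people per group in a sample of n.

    Assumption: round to the nearest integer, with a floor of one person.
    """
    return {group: max(1, round(p * n)) for group, p in props.items()}


def boost_rare(counts, threshold=RARE_THRESHOLD, boost=BOOST):
    """Add `boost` to every count that is at or below `threshold`."""
    return {
        group: count + boost if count <= threshold else count
        for group, count in counts.items()
    }


expected = expected_counts(race_props, SAMPLE_SIZE)
boosted = boost_rare(expected)
for group in race_props:
    print(f"{group}: expected {expected[group]}, boosted {boosted[group]}")
```

Under these assumptions, a group with an expected count of 3 is boosted to 6, while a group with an expected count of 4 is left unchanged. The reserve, weighting, and replacement steps are not covered by this sketch.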
| sub_group | props | target_counts | count_rand | proportions_rand | count_w | proportions_w |
|---|---|---|---|---|---|---|
| race_ethnicity | ||||||
| American Indian or Alaska Native | 0.002 | 1 | NA | NA | 3 | 0.015 |
| Native Hawaiian or Other Pacific Islander | 0.001 | 1 | NA | NA | 1 | 0.005 |
| Black or African American | 0.014 | 3 | 2 | 0.010 | 5 | 0.025 |
| Not Listed | 0.021 | 4 | 2 | 0.010 | 8 | 0.040 |
| Asian | 0.051 | 10 | 13 | 0.065 | 18 | 0.090 |
| Two or more | 0.055 | 11 | 11 | 0.055 | 21 | 0.104 |
| Hispanic | 0.100 | 20 | 23 | 0.115 | 18 | 0.090 |
| White (not latino) | 0.756 | 151 | 149 | 0.745 | 127 | 0.632 |
| gender | ||||||
| Transgender | 0.060 | 12 | 14 | 0.070 | 17 | 0.085 |
| Prefer to self identify | 0.060 | 12 | 11 | 0.055 | 26 | 0.129 |
| Man | 0.440 | 88 | 84 | 0.420 | 53 | 0.264 |
| Woman | 0.440 | 88 | 91 | 0.455 | 105 | 0.522 |
| child_household | ||||||
| Yes | 0.200 | 40 | 40 | 0.200 | 48 | 0.239 |
| No | 0.800 | 160 | 160 | 0.800 | 153 | 0.761 |
| disability | ||||||
| Yes | 0.250 | 50 | 47 | 0.235 | 48 | 0.239 |
| No | 0.750 | 150 | 153 | 0.765 | 153 | 0.761 |
NOTE: A visual inspection of the above table will often show that groups with small sample sizes are dropped by the random (non-weighted) selection procedure; these appear as NA in the count_rand column. The inadvertent dropping of some rare characteristics can happen, and this highlights an advantage of the customized weighted selection procedure. However, another approach could be to take many random samples (say, 30) and then average across them. This would at least prevent rare groups from being completely dropped.
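The averaging idea mentioned in the note can be sketched as follows. The pool is rebuilt from the race and ethnicity counts in the applicant table; the function name `average_counts` and the seed are hypothetical, and this is an illustration of the idea rather than a procedure the project has adopted.

```python
import random
from collections import Counter

# Race and ethnicity counts from the applicant pool table (they sum to 4000).
pool_counts = {
    "American Indian or Alaska Native": 6,
    "Asian": 185,
    "Black or African American": 58,
    "Hispanic": 415,
    "Native Hawaiian or Other Pacific Islander": 7,
    "Not Listed": 82,
    "Two or more": 231,
    "White (not latino)": 3016,
}
# One list entry per applicant, labelled with that applicant's group.
pool = [group for group, count in pool_counts.items() for _ in range(count)]


def average_counts(pool, groups, sample_size=200, n_samples=30, seed=7):
    """Average per-group counts across repeated simple random samples."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_samples):
        totals.update(rng.sample(pool, k=sample_size))
    return {group: totals[group] / n_samples for group in groups}


averages = average_counts(pool, pool_counts)
for group, avg in averages.items():
    print(f"{group}: {avg:.2f}")
```

The averages are fractional, so they describe the expected composition of a random sample rather than a roster of selected individuals.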
Discussion points could center on how ‘individuals with rare characteristics’ are defined. Currently these are simply the lower half of the characteristics ranked by frequency, but other definitions are certainly possible. Another discussion point could be the fraction of the final sample reserved for individuals with rare characteristics. 25% is a reasonable starting point but this could be increased or decreased.
The second wave selection works by taking the wave 1 selection and then using an algorithm to find each individual’s closest match from the 3800 individuals remaining in the applicant pool. This is done using a technique called propensity score matching (Ho et al. 2011).
The third wave of sampled individuals is selected using the same process.
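The wave-by-wave matching can be illustrated with a deliberately simplified stand-in. The sketch below does not estimate propensity scores. Instead it uses greedy one-to-one nearest-neighbor matching without replacement, where distance is the number of characteristics on which two people differ. The toy pool, the function names, and the choice to match wave 3 against wave 1 (rather than wave 2) are all assumptions made for illustration.

```python
import random

FIELDS = ("race_ethnicity", "gender", "child_household", "disability")


def distance(a, b):
    """Count the characteristics on which two people differ."""
    return sum(a[field] != b[field] for field in FIELDS)


def match_next_wave(reference_wave, remaining_pool):
    """Greedy 1:1 matching without replacement.

    Returns the matches (one per person in reference_wave, in the same
    order) and the applicants still unselected afterwards.
    """
    available = list(remaining_pool)
    matches = []
    for person in reference_wave:
        best = min(available, key=lambda candidate: distance(person, candidate))
        available.remove(best)
        matches.append(best)
    return matches, available


# Hypothetical toy pool of 60 applicants with a reduced set of levels.
rng = random.Random(3)
LEVELS = {
    "race_ethnicity": ["White (not latino)", "Hispanic", "Asian"],
    "gender": ["Woman", "Man"],
    "child_household": ["Yes", "No"],
    "disability": ["Yes", "No"],
}
pool = [
    {"id": i, **{field: rng.choice(LEVELS[field]) for field in FIELDS}}
    for i in range(60)
]

wave_1, remaining = pool[:5], pool[5:]
wave_2, remaining = match_next_wave(wave_1, remaining)
wave_3, remaining = match_next_wave(wave_1, remaining)

for w1, w2, w3 in zip(wave_1, wave_2, wave_3):
    print(w1["id"], w2["id"], w3["id"])
```

Because matches are removed from the pool as they are made, the wave 3 match for a given row can never be closer to the wave 1 person than the wave 2 match was.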
First, compare the population data to the applicant data:
Figure 5.1: Proportions by race group in simulated population data.
Figure 5.2: Proportions by gender in simulated population data.
We can examine just the race and gender breakdowns, above, to see that randomly sampling 4000 individuals from our population of 25000 leads to proportions in each group that are fairly similar.
Next, we can see how the proportions in each sampling wave compare to the ‘target’ proportions in the population data.
First, a table showing counts and proportions for each sampling wave compared to the population:
| sub_group | target_props | target_counts | count_w1 | props_w1 | count_w2 | props_w2 | count_w3 | props_w3 |
|---|---|---|---|---|---|---|---|---|
| race_ethnicity | ||||||||
| American Indian or Alaska Native | 0.002 | 1 | 3 | 0.015 | 3 | 0.015 | NA | NA |
| Native Hawaiian or Other Pacific Islander | 0.001 | 1 | 1 | 0.005 | NA | NA | NA | NA |
| Black or African American | 0.014 | 3 | 5 | 0.025 | 6 | 0.030 | 7 | 0.035 |
| Not Listed | 0.021 | 4 | 8 | 0.040 | 9 | 0.045 | 10 | 0.050 |
| Asian | 0.051 | 10 | 18 | 0.090 | 19 | 0.095 | 19 | 0.095 |
| Two or more | 0.055 | 11 | 21 | 0.104 | 20 | 0.100 | 21 | 0.104 |
| Hispanic | 0.100 | 20 | 18 | 0.090 | 17 | 0.085 | 17 | 0.085 |
| White (not latino) | 0.756 | 151 | 127 | 0.632 | 127 | 0.632 | 127 | 0.632 |
| gender | ||||||||
| Transgender | 0.060 | 12 | 17 | 0.085 | 17 | 0.085 | 16 | 0.080 |
| Prefer to self identify | 0.060 | 12 | 26 | 0.129 | 26 | 0.129 | 29 | 0.144 |
| Man | 0.440 | 88 | 53 | 0.264 | 54 | 0.269 | 54 | 0.269 |
| Woman | 0.440 | 88 | 105 | 0.522 | 104 | 0.517 | 102 | 0.507 |
| child_household | ||||||||
| Yes | 0.200 | 40 | 48 | 0.239 | 48 | 0.239 | 48 | 0.239 |
| No | 0.800 | 160 | 153 | 0.761 | 153 | 0.761 | 153 | 0.761 |
| disability | ||||||||
| Yes | 0.250 | 50 | 48 | 0.239 | 47 | 0.234 | 47 | 0.234 |
| No | 0.750 | 150 | 153 | 0.761 | 154 | 0.766 | 154 | 0.766 |
Figure 5.3: Proportions by racial grouping, sampling waves.
Figure 5.4: Proportions by gender, sampling waves.
Figure 5.5: Proportions of households with a child in the home, by sampling wave.
Figure 5.6: Proportions by disability status, sampling wave.
The first example dataset presents a column for each sampling wave. The intended use is that all the individuals in the far left column, Wave 1, are selected to the program for verification. If some of these individuals cannot be verified, their replacement is the cell in the same row immediately to the right, in the Wave 2 column. If someone in Wave 2 cannot be verified, then proceed to Wave 3.
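The row-wise replacement rule described above can be sketched as a small lookup. The IDs, the `tracking` rows, and the function name `current_selection` are hypothetical and serve only to illustrate how the condensed table might be read.

```python
WAVES = ("wave_1", "wave_2", "wave_3")

# Hypothetical condensed tracking table: one row per program slot.
tracking = [
    {"wave_1": 101, "wave_2": 245, "wave_3": 388},
    {"wave_1": 102, "wave_2": 246, "wave_3": 389},
    {"wave_1": 103, "wave_2": 247, "wave_3": 390},
]


def current_selection(row, failed_verification):
    """Return (wave, id) for the left-most person in the row who has not
    failed verification, or (None, None) if all three waves are exhausted."""
    for wave in WAVES:
        if row[wave] not in failed_verification:
            return wave, row[wave]
    return None, None


# Example: IDs 101, 245 and 102 could not be verified.
failed = {101, 245, 102}
selections = [current_selection(row, failed) for row in tracking]
for row, (wave, person_id) in zip(tracking, selections):
    print(row["wave_1"], "->", wave, person_id)
```

A result of `(None, None)` marks a slot that needs a selection beyond the first three waves.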
These are for illustration and to aid in thinking through the tracking process. A strategy, yet to be defined, will be used for selecting cases beyond the first three sampling waves.
Propensity score matching is a technique often used in quasi-experimental designs for statistically matching members of a treatment group to members of a control group. In our case, we use the same kind of algorithm to match each participant in sampling waves 2 and 3 with their most similar counterpart in the applicant pool.